Skip to content

Add observation.metadata.choice_nodes #4422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jun 8, 2025

Conversation

tybug
Copy link
Member

@tybug tybug commented May 31, 2025

Closes (most of) #4387

To do:

  • document the choice_nodes format (nonstandard nan patterns, large ints, etc)
  • document observability.OBSERVABILITY_CHOICE_NODES
  • work out a proper @settings for this? (as a follow-up?)

I'm also combining this with changes to BuildContext (for hypofuzz), so that this PR can be "the entirety of the changes necessary to make HypofuzzProvider work". I've split this out and have dropped it from this PR

@tybug tybug force-pushed the observability-choice-nodes branch from d82561e to a02dbac Compare June 4, 2025 00:48
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a concrete use-case in mind for the node constraints? If not, how about b64-encoded bytes format for the choice sequence? It'd be a fair bit smaller, and it looks like the human-readable version is already complicated enough that outside parties will want to use our code to deal with that.

Either way, I was thinking that we could have a flat list of [start, end, was_discarded, label] values, derived from the Spans class to track the tree structure of the input.

Comment on lines 57 to 60
"interesting_origin": {
"type": ["string", "null"],
"description": "The internal InterestingOrigin object for failing tests, if and only if ``status == \"failed\"``. The ``traceback`` string value is derived from this object."
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we already store this in the top-level status_reason key.

[goes and checks] aaaaaannnd, we mishandle non-failing cases, oops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, INVALID and OVERRUN get collapsed together (which is probably-good as a library agnostic format)

Copy link
Member Author

@tybug tybug Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I thought this was status, not status_reason. The actual reason I split this out is because hypofuzz wants the InterestingOrigin object itself, not the str(origin) version in status_reason.

[e: and this means the type here is wrong, fixing..]

[e2: nope, it is a string because we're documenting the file format here, not the in-memory format. Hmm, unfortunate that the two are similar but different...]

@tybug
Copy link
Member Author

tybug commented Jun 4, 2025

Do we have a concrete use-case in mind for the node constraints? If not, how about b64-encoded bytes format for the choice sequence? It'd be a fair bit smaller, and it looks like the human-readable version is already complicated enough that outside parties will want to use our code to deal with that.

Concrete use case is passing in-memory to hypofuzz, mostly for ordering corpus elements, cmd+f "choice_nodes" in https://github.com/Zac-HD/hypofuzz/pull/113/files.

I don't have a consumer of the serialized file format in mind, but my thinking is "if you're in a position to be using hypothesis utilities to transform the observations, you should just hook OBSERVABILITY_CALLBACKS and work with it in-memory (or serialize it yourself in your own format to a file). And if you're not in a position to use hypothesis utilities (working in another language etc), you'll greatly benefit from a human-readable format than having to replicate our choices_from_bytes function"

@Zac-HD
Copy link
Member

Zac-HD commented Jun 4, 2025

Yeah, OK, fair enough. Feels inelegant but I think moving fast at the cost of inelegant formats is the right call for now.

@tybug tybug force-pushed the observability-choice-nodes branch from e6d9c62 to 6dc159d Compare June 4, 2025 07:12
@tybug
Copy link
Member Author

tybug commented Jun 4, 2025

[start, end, was_discarded, label

I went with [str(span.label), span.start, span.end, span.discarded]

Comment on lines +287 to 291
# We should only be sending observability reports for datas that have finished
# being modified.
assert data.frozen

if status_reason is not None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data needs to be frozen to access data.spans, and at that point it seems reasonable to require that all datas passed here are frozen. I don't think freezing more than we used to at any of the callers is harmful, and anyway the whole point of .freeze is a phase-transition at which point it cannot be modified, which we definitely expect if we're reporting an observation for a data.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nearly done, I think

Comment on lines 165 to 166
Hypothesis Metadata
^^^^^^^^^^^^^^^^^^^
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's

  • move the information-message schema first
  • make this a subheading of the test-case schema
  • consider pulling out the choice_nodes and choice_spans to a second subheading and collapsing <details> by default (it's a lot of very-niche-interest text)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure sphinx has collapsible elements unless we install some third party extension 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a cute trick, the built-in only directive allows for arbitrary html:

.. only:: html

   .. raw:: html

      <details>
      <summary>Click to expand</summary>

Put whatever you want here, and in the HTML build only it'll be inside a details tag!

.. only:: html

   .. raw:: html

      </details>

Copy link
Member Author

@tybug tybug Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to collapse for now, but TBH I think observability might deserve its own page at this point. (I think ghostwriter does as well). IMO a selling point of big api reference pages is that you don't have to worry about things like niche content or collapsing text

(which also makes cmd+f harder) TIL cmd+f autoexpands <details>

but still, collapsing makes it easy to miss, and this is a pretty important section for the right subset of people

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, good argument, leave it uncollapsed and we'll move it later.

@Zac-HD Zac-HD merged commit 26b75d5 into HypothesisWorks:master Jun 8, 2025
60 checks passed
@tybug tybug deleted the observability-choice-nodes branch June 9, 2025 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants